Data Science and Ebola
Data Science: Today, everybody and everything produces data. People produce
large amounts of data in social networks and in commercial transactions.
Medical, corporate, and government databases continue to grow. Sensors continue
to get cheaper and are increasingly connected, creating an Internet of Things,
and generating even more data. In every discipline, large, diverse, and rich
data sets are emerging, from astrophysics, to the life sciences, to the
behavioral sciences, to finance and commerce, to the humanities and to the
arts. In every discipline people want to organize, analyze, optimize and
understand their data to answer questions and to deepen insights. The science
that is transforming this ocean of data into a sea of knowledge is called data
science. This lecture discusses how data science has changed the handling of
one of the most visible challenges to public health, the 2014 Ebola outbreak
in West Africa.
Comment: Inaugural lecture Leiden University
Crowd-Sourcing Fuzzy and Faceted Classification for Concept Search
Searching for concepts in science and technology is often a difficult task.
To facilitate concept search, different types of human-generated metadata have
been created to define the content of scientific and technical disclosures.
Classification schemes such as the International Patent Classification (IPC)
and MEDLINE's MeSH are structured and controlled, but require trained experts
and central management to restrict ambiguity (Mork, 2013). While unstructured
tags of folksonomies can be processed to produce a degree of structure
(Kalendar, 2010; Karampinas, 2012; Sarasua, 2012; Bragg, 2013), the freedom
enjoyed by the crowd typically results in less precision (Stock, 2007).
Existing classification schemes suffer from inflexibility and ambiguity.
Since humans understand language, inference, implication, abstraction and hence
concepts better than computers, we propose to harness the collective wisdom of
the crowd. To do so, we propose a novel classification scheme that is
sufficiently intuitive for the crowd to use, yet powerful enough to facilitate
search by analogy, and flexible enough to deal with ambiguity. The system will
enhance existing classification information. Linking up with the semantic web
and computer intelligence, a Citizen Science effort (Good, 2013) would support
innovation by improving the quality of granted patents, reducing duplicative
research, and stimulating problem-oriented solution design.
A prototype of our design is in preparation. A crowd-sourced fuzzy and
faceted classification scheme will allow for better concept search and improved
access to prior art in science and technology.
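The abstract does not specify the scheme's data model, so the following is only a minimal sketch of one way crowd input could be turned into a fuzzy, faceted classification. Everything in it is an assumption made for illustration: the facet names (`function`, `material`), the membership degrees in [0, 1], the mean as the aggregation rule, and the fuzzy Jaccard similarity used to compare two disclosures.

```python
from collections import defaultdict


def aggregate_votes(votes):
    """Combine crowd votes into fuzzy memberships per facet.

    votes: iterable of (facet, value, degree) with degree in [0, 1].
    Returns {facet: {value: mean degree}}. The mean is an assumed
    aggregation rule, not one taken from the abstract.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for facet, value, degree in votes:
        sums[facet][value] += degree
        counts[facet][value] += 1
    return {
        facet: {v: sums[facet][v] / counts[facet][v] for v in sums[facet]}
        for facet in sums
    }


def similarity(a, b):
    """Fuzzy Jaccard similarity between two faceted classifications.

    Sums min(degree_a, degree_b) over all (facet, value) pairs and
    divides by the sum of max(degree_a, degree_b).
    """
    keys = {(f, v) for doc in (a, b) for f in doc for v in doc[f]}
    num = den = 0.0
    for facet, value in keys:
        da = a.get(facet, {}).get(value, 0.0)
        db = b.get(facet, {}).get(value, 0.0)
        num += min(da, db)
        den += max(da, db)
    return num / den if den else 0.0


# Hypothetical votes from three contributors on one disclosure.
doc = aggregate_votes([
    ("function", "filtering", 1.0),
    ("function", "filtering", 0.5),
    ("material", "polymer", 0.8),
])
```

Because each facet is scored separately, a query can match on one facet (for example, the same function) while differing on another, which is one possible reading of "search by analogy".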
Scaling Monte Carlo Tree Search on Intel Xeon Phi
Many algorithms have been parallelized successfully on the Intel Xeon Phi
coprocessor, especially those with regular, balanced, and predictable data
access patterns and instruction flows. Irregular and unbalanced algorithms are
harder to parallelize efficiently. They are, for instance, present in
artificial intelligence search algorithms such as Monte Carlo Tree Search
(MCTS). In this paper, we study the scaling behavior of MCTS in a highly
optimized real-world application on real hardware. The Intel Xeon Phi allows
shared memory scaling studies up to 61 cores and 244 hardware threads. We
compare work-stealing (Cilk Plus and TBB) and work-sharing (FIFO scheduling)
approaches. Interestingly, we find that a straightforward thread pool with a
work-sharing FIFO queue shows the best performance. A crucial element for this
high performance is control of the grain size, an approach that we call
Grain Size Controlled Parallel MCTS. A subsequent comparison with Xeon CPUs
shows an even clearer distinction in performance between different threading
libraries. We achieve, to the best of our knowledge, the fastest
implementation of a parallel MCTS on the 61-core Intel Xeon Phi using a real
application (a speedup of 47 relative to a sequential run).
Comment: 8 pages, 9 figures
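To make the idea of a work-sharing FIFO queue with a controlled grain size concrete, here is a minimal Python sketch. It is not the paper's implementation: the playout is a toy coin flip rather than a real MCTS simulation, the function names are hypothetical, and Python threads are used only to show the structure (CPython's global interpreter lock means this sketch will not reproduce the reported scaling).

```python
import queue
import random
import threading


def run_playouts(seed, count):
    """Toy stand-in for `count` MCTS playouts; returns the number of wins."""
    rng = random.Random(seed)
    return sum(rng.random() < 0.5 for _ in range(count))


def split_into_grains(total_playouts, grain_size):
    """Split the playout budget into tasks of at most `grain_size` playouts."""
    full, rest = divmod(total_playouts, grain_size)
    return [grain_size] * full + ([rest] if rest else [])


def parallel_playouts(total_playouts, grain_size, workers):
    """Run all playouts on a thread pool fed by one shared FIFO queue."""
    grains = split_into_grains(total_playouts, grain_size)
    tasks = queue.Queue()  # FIFO: tasks are handed out in insertion order
    for seed, count in enumerate(grains):
        tasks.put((seed, count))

    results = []
    lock = threading.Lock()

    def worker():
        # Each worker repeatedly takes the oldest pending task.
        while True:
            try:
                seed, count = tasks.get_nowait()
            except queue.Empty:
                return
            wins = run_playouts(seed, count)
            with lock:
                results.append(wins)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results), len(grains)


wins, task_count = parallel_playouts(total_playouts=1000, grain_size=100, workers=4)
```

The grain size sets the trade-off: a small grain gives many short tasks (better load balance, more queue overhead), while a large grain gives few long tasks (less overhead, more risk of idle workers).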